


Chapter 14 Statistics (Concepts)

Welcome to this advanced chapter on Statistics, where we significantly deepen our analytical capabilities for interpreting numerical data, building substantially upon the concepts introduced in Class 9. While previous studies equipped us with basic data organization and representation, this chapter focuses on more sophisticated techniques, particularly for handling grouped data, and introduces precise methods for calculating measures that summarize the data's central characteristics. Our goal is to move beyond simple tabulation towards robust analysis and insightful interpretation.

We begin by briefly revisiting the foundational steps of data handling – distinguishing between primary and secondary data collection, and the crucial transition from raw data to organized formats. The utility of frequency distribution tables, both ungrouped and grouped, is reinforced. Key concepts associated with grouped data are solidified: defining non-overlapping class intervals, calculating class size ($h$), finding class marks ($x_i$) (the midpoints of intervals, $x_i = \frac{\text{Upper limit + Lower limit}}{2}$), and determining the range. Organizing large datasets into grouped frequency tables, often using tally marks during compilation (e.g., $\bcancel{||||}$ for 5 observations, $|||$ for 3), is essential for clarity and further analysis.

Graphical representation techniques are refined. While bar graphs are useful, our focus shifts to visuals suited to grouped continuous data, in particular the cumulative frequency curves (ogives) covered later in this chapter.

The core of this chapter lies in calculating Measures of Central Tendency for grouped data, providing single values to represent the 'center' of the distribution: the mean, the median, and the mode.

Finally, we introduce Cumulative Frequency Distributions, compiled in 'less than' type (summing frequencies up to the upper limit of each class) and 'more than' type (summing frequencies from the lower limit downwards). Their graphical representations are called Ogives (cumulative frequency curves). The 'less than' ogive uses upper limits on the x-axis, while the 'more than' ogive uses lower limits. Ogives provide a visual way to estimate the median – either from the x-coordinate corresponding to $\frac{n}{2}$ on a single ogive or from the x-coordinate of the intersection point of the 'less than' and 'more than' ogives. The approximate empirical relationship, $\mathbf{Mode \approx 3 \times Median - 2 \times Mean}$, is also often noted as a useful check.



Mean of Grouped Data

In Class 9, you learned to calculate the mean for ungrouped data by summing all individual observations and dividing by the total count ($\overline{x} = \frac{\sum x_i}{n}$). However, when data is presented in a grouped frequency distribution, the exact value of each observation is lost; we only know the class interval it falls into. To handle this, we use methods that approximate the mean.

The core assumption for all these methods is that the observations within each class interval are centered around its midpoint. This midpoint, known as the class mark, is used as the representative value for all data points within that class.


Grouped Data and Class Mark

Grouped data is data summarised in a frequency table with class intervals. For example:

Marks (Class Interval) Number of Students (Frequency, $f_i$)
10 - 25 2
25 - 40 3
40 - 55 7

The class mark is the midpoint of a class interval, calculated as the average of its lower and upper limits. It acts as the single value representing that entire class.

$\text{Class Mark} (x_i) = \frac{\text{Lower Class Limit} + \text{Upper Class Limit}}{2}$

For the interval 10-25, the class mark is $\frac{10+25}{2} = 17.5$.


Methods for Finding the Mean of Grouped Data

1. Direct Method

This is the most straightforward method. It is a weighted average where each class mark ($x_i$) is weighted by its frequency ($f_i$). It is best used when the numerical values of $x_i$ and $f_i$ are small.

Formula:

$\overline{x} = \frac{\sum f_i x_i}{\sum f_i}$

Derivation of the Formula

The direct method is a logical extension of the basic mean formula. The mean is the sum of all observations divided by the number of observations.

For grouped data, we don't have individual observations. We make an assumption: every observation in a class is equal to its class mark, $x_i$.

If a class has a frequency of $f_i$, the sum of observations in that class is approximated as the class mark repeated $f_i$ times: $x_i + x_i + ... + x_i$ ($f_i$ times) $= f_i \times x_i$.

The total sum of all observations in the dataset is the sum of these products across all classes: $\sum f_i x_i$.

The total number of observations is simply the sum of all frequencies: $\sum f_i$.

Therefore, Mean $\overline{x} = \frac{\text{Total Sum of Observations}}{\text{Total Number of Observations}} = \frac{\sum f_i x_i}{\sum f_i}$.

2. Assumed Mean Method (Shortcut Method)

When values of $x_i$ are large, the product $f_i x_i$ can become difficult to calculate. The Assumed Mean Method simplifies this by choosing an 'assumed mean' ($a$) and working with smaller deviation values ($d_i$).

Formula:

$\overline{x} = a + \frac{\sum f_i d_i}{\sum f_i}$, where the deviation is $d_i = x_i - a$

Derivation of the Formula

This formula is derived algebraically from the Direct Method formula.

We know the deviation is defined as $d_i = x_i - a$. We can rearrange this to write the class mark as $x_i = a + d_i$.

Now, substitute this expression for $x_i$ into the Direct Method formula:

$\overline{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i (a + d_i)}{\sum f_i}$

Distribute $f_i$ inside the summation: $\overline{x} = \frac{\sum (f_i a + f_i d_i)}{\sum f_i}$.

The summation can be split over the addition: $\overline{x} = \frac{\sum f_i a + \sum f_i d_i}{\sum f_i}$.

Separate the fraction into two parts: $\overline{x} = \frac{\sum f_i a}{\sum f_i} + \frac{\sum f_i d_i}{\sum f_i}$.

Since the assumed mean $a$ is a constant, it can be factored out of the first summation: $\overline{x} = \frac{a \sum f_i}{\sum f_i} + \frac{\sum f_i d_i}{\sum f_i}$.

Cancel out $\sum f_i$ in the first term, which leaves us with the final formula:

$\overline{x} = a + \frac{\sum f_i d_i}{\sum f_i}$

3. Step-Deviation Method

This method further simplifies the Assumed Mean method, and is best used when all class intervals have a uniform width ($h$). It reduces the deviation values to even smaller, simpler integers ($u_i$).

Formula:

$\overline{x} = a + \left(\frac{\sum f_i u_i}{\sum f_i}\right) \times h$, where $u_i = \frac{x_i - a}{h}$

Derivation of the Formula

This formula is derived from the Assumed Mean formula.

The step-deviation is defined as $u_i = \frac{d_i}{h}$. We can rearrange this to write the deviation as $d_i = u_i h$.

Now, substitute this expression for $d_i$ into the Assumed Mean formula:

$\overline{x} = a + \frac{\sum f_i d_i}{\sum f_i} = a + \frac{\sum f_i (u_i h)}{\sum f_i}$

Since the class width $h$ is a constant for all classes, it can be factored out of the summation:

$\overline{x} = a + \frac{h \sum f_i u_i}{\sum f_i}$

Rearranging this gives the final formula for the Step-Deviation Method:

$\overline{x} = a + \left(\frac{\sum f_i u_i}{\sum f_i}\right) \times h$


Example 1. The following distribution shows the daily pocket allowance of children of a locality. Find the mean daily pocket allowance using a suitable method.

Daily Pocket Allowance (in ₹) Number of Children ($f_i$)
11 - 13 7
13 - 15 6
15 - 17 9
17 - 19 13
19 - 21 20
21 - 23 5
23 - 25 4

Answer:

Since the class intervals have a uniform width, the Step-Deviation method is the most efficient. We will demonstrate all three methods to show they yield the same result.

Solution 1: Direct Method

First, we find the class mark ($x_i$) for each interval and then calculate the product $f_i x_i$.

Pocket Allowance (₹) Frequency ($f_i$) Class Mark ($x_i$) $f_i x_i$
11 - 13 7 12 84
13 - 15 6 14 84
15 - 17 9 16 144
17 - 19 13 18 234
19 - 21 20 20 400
21 - 23 5 22 110
23 - 25 4 24 96
Total $\sum f_i = 64$ $\sum f_i x_i = 1152$

Using the formula $\overline{x} = \frac{\sum f_i x_i}{\sum f_i}$:

$\overline{x} = \frac{1152}{64} = 18$

... (1)

Solution 2: Assumed Mean Method

Let's choose an assumed mean, $a=18$. We calculate the deviations $d_i = x_i - 18$.

Allowance (₹) $f_i$ $x_i$ $d_i = x_i - 18$ $f_i d_i$
11-13 7 12 -6 -42
13-15 6 14 -4 -24
15-17 9 16 -2 -18
17-19 13 18 0 0
19-21 20 20 2 40
21-23 5 22 4 20
23-25 4 24 6 24
Total $\sum f_i = 64$ $\sum f_i d_i = 0$

Using the formula $\overline{x} = a + \frac{\sum f_i d_i}{\sum f_i}$:

$\overline{x} = 18 + \frac{0}{64} = 18$

... (2)

Solution 3: Step-Deviation Method

Let's choose $a=18$. The class width is uniform, $h = 13 - 11 = 2$. We calculate $u_i = \frac{x_i - a}{h}$.

Allowance (₹) $f_i$ $x_i$ $u_i = \frac{x_i-18}{2}$ $f_i u_i$
11-13 7 12 -3 -21
13-15 6 14 -2 -12
15-17 9 16 -1 -9
17-19 13 18 0 0
19-21 20 20 1 20
21-23 5 22 2 10
23-25 4 24 3 12
Total $\sum f_i = 64$ $\sum f_i u_i = 0$

Using the formula $\overline{x} = a + \left(\frac{\sum f_i u_i}{\sum f_i}\right) \times h$:

$\overline{x} = 18 + \left(\frac{0}{64}\right) \times 2 = 18 + 0 = 18$

... (3)

Answer: The mean daily pocket allowance is $\textsf{₹ } 18$. All three methods confirm this result.
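
The three solutions above can be cross-checked with a short script. This is a minimal sketch in Python; the function names are illustrative and not part of the chapter, and the data is the pocket-allowance table from this example.

```python
# Pocket-allowance data from Example 1: (lower limit, upper limit, frequency)
classes = [(11, 13, 7), (13, 15, 6), (15, 17, 9), (17, 19, 13),
           (19, 21, 20), (21, 23, 5), (23, 25, 4)]

marks = [(lo + hi) / 2 for lo, hi, _ in classes]   # class marks x_i
freqs = [f for _, _, f in classes]                 # frequencies f_i
n = sum(freqs)                                     # sum of f_i

def direct_mean():
    return sum(f * x for f, x in zip(freqs, marks)) / n

def assumed_mean(a):
    # deviations d_i = x_i - a
    return a + sum(f * (x - a) for f, x in zip(freqs, marks)) / n

def step_deviation_mean(a, h):
    # step deviations u_i = (x_i - a) / h, valid when every class has width h
    return a + h * sum(f * (x - a) / h for f, x in zip(freqs, marks)) / n

print(direct_mean())               # 18.0
print(assumed_mean(18))            # 18.0
print(step_deviation_mean(18, 2))  # 18.0
```

Any other choice of $a$ should give the same mean, since $a$ cancels out in the derivation; only the size of the intermediate numbers changes.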


Example 2. The following table gives the literacy rate (in percentage) of 35 cities. Find the mean literacy rate using the step-deviation method.

Literacy Rate (%) Number of Cities ($f_i$)
45-55 3
55-65 10
65-75 11
75-85 8
85-95 3

Answer:

To Find:

The mean literacy rate.

Solution (Step-Deviation Method):

The numerical values for class marks will be large (50, 60, 70, etc.), making the Step-Deviation method the most suitable.

First, we find the class marks ($x_i$). Then, we choose an assumed mean ($a$). Let's pick the middle value, $a=70$. The class width is uniform, $h = 55 - 45 = 10$.

Literacy Rate (%) Frequency ($f_i$) Class Mark ($x_i$) $u_i = \frac{x_i - 70}{10}$ $f_i u_i$
45-55 3 50 -2 -6
55-65 10 60 -1 -10
65-75 11 70 0 0
75-85 8 80 1 8
85-95 3 90 2 6
Total $\sum f_i = 35$ $\sum f_i u_i = -2$

Here, $\sum f_i = 35$, $\sum f_i u_i = (-6 - 10) + (8 + 6) = -16 + 14 = -2$, $a=70$, and $h=10$.

Using the Step-Deviation formula:

$\overline{x} = a + \left(\frac{\sum f_i u_i}{\sum f_i}\right) \times h$

$\overline{x} = 70 + \left(\frac{-2}{35}\right) \times 10$

$\overline{x} = 70 - \frac{20}{35} = 70 - \frac{4}{7}$

Now, $\frac{4}{7} \approx 0.57$.

$\overline{x} \approx 70 - 0.57 = 69.43$

... (1)

Answer: The mean literacy rate is approximately 69.43%.
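
As a hedged check of the working above, the sketch below repeats the step-deviation calculation with exact fractions and tries several assumed means. The helper name `step_deviation_mean` is an assumption made for illustration; the data is the literacy-rate table from this example.

```python
from fractions import Fraction

# Literacy-rate data from Example 2: class mark -> number of cities
table = {50: 3, 60: 10, 70: 11, 80: 8, 90: 3}
h = 10  # uniform class width

def step_deviation_mean(a):
    n = sum(table.values())
    # sum of f_i * u_i with u_i = (x_i - a) / h, kept as an exact fraction
    total_fu = sum(f * Fraction(x - a, h) for x, f in table.items())
    return a + total_fu / n * h

for a in (50, 70, 90):
    mean = step_deviation_mean(a)
    print(a, mean, round(float(mean), 2))
```

Each assumed mean should produce the same exact value, $\frac{486}{7} = 70 - \frac{4}{7}$, which rounds to 69.43.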


Example 3. The mean of the following frequency distribution is 50. Find the value of the missing frequency, p.

Class Frequency ($f_i$)
0-20 17
20-40 p
40-60 32
60-80 24
80-100 19

Answer:

To Find:

The value of the missing frequency, p.

Given:

The mean of the distribution, $\overline{x} = 50$.

Solution (Direct Method):

When a frequency is missing, the Direct Method is often the most straightforward approach as it leads to a simple linear equation.

We create the calculation table:

Class Frequency ($f_i$) Class Mark ($x_i$) $f_i x_i$
0-20 17 10 170
20-40 p 30 $30p$
40-60 32 50 1600
60-80 24 70 1680
80-100 19 90 1710
Total $\sum f_i = 92+p$ $\sum f_i x_i = 5160+30p$

From the table, we have $\sum f_i = 17+p+32+24+19 = 92+p$.

And $\sum f_i x_i = 170 + 30p + 1600 + 1680 + 1710 = 5160 + 30p$.

Now, we use the Direct Method formula and the given mean, $\overline{x} = 50$.

$\overline{x} = \frac{\sum f_i x_i}{\sum f_i}$

$50 = \frac{5160 + 30p}{92 + p}$

Now, solve for p:

$50(92 + p) = 5160 + 30p$

$4600 + 50p = 5160 + 30p$

$50p - 30p = 5160 - 4600$

$20p = 560$

$p = \frac{560}{20} = 28$

... (1)

Answer: The value of the missing frequency, p, is 28.
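
The linear equation solved above can be written once in general form. The following is a sketch only: `missing_frequency` is a hypothetical helper, and it assumes integer class marks, integer frequencies, and a given mean that differs from the class mark of the incomplete row.

```python
from fractions import Fraction

def missing_frequency(known, x_missing, mean):
    """known: list of (class mark, frequency) pairs for the complete rows."""
    s = sum(f for _, f in known)        # sum of the known frequencies
    t = sum(x * f for x, f in known)    # sum of the known f_i * x_i
    # mean * (s + p) = t + x_missing * p  =>  p = (t - mean * s) / (mean - x_missing)
    return Fraction(t - mean * s, mean - x_missing)

known = [(10, 17), (50, 32), (70, 24), (90, 19)]
p = missing_frequency(known, x_missing=30, mean=50)
print(p)  # 28
```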



Mode of Grouped Data

The mode is a measure of central tendency that represents the most frequently occurring value in a dataset. For ungrouped data, the mode is simply the observation with the highest frequency. For grouped data, we cannot identify a single value, but we can identify the class interval with the highest frequency. This class is called the modal class. The mode is then estimated as a specific value that lies within this modal class, calculated using a formula that accounts for the influence of the neighboring classes.


Formula for Mode of Grouped Data

To find the mode for grouped data, we first identify the modal class. Then, we use the following formula:

$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$

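where the terms are defined as follows:

  • $l$ = lower limit of the modal class
  • $h$ = class width (assumed equal for all classes)
  • $f_1$ = frequency of the modal class
  • $f_0$ = frequency of the class preceding the modal class
  • $f_2$ = frequency of the class succeeding the modal class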

If the modal class is the first class interval, then $f_0$ is taken as 0. If it is the last, $f_2$ is taken as 0.


Derivation of the Mode Formula (Graphical Method)

The formula for the mode is derived from a histogram of the frequency distribution. The mode is the value on the x-axis corresponding to the peak of the tallest bar (the modal class). Its exact position within the bar is influenced by the heights of the adjacent bars.

[Figure: Histogram showing the modal class and adjacent classes. Similar triangles are used to derive the mode formula.]

In the histogram above, the modal class is represented by the bar ABCD. The frequency of the preceding class is $f_0$ (bar PQRA) and the succeeding class is $f_2$ (bar CDEF).

To estimate the mode, we draw two lines: one from the top-right corner of the modal class (C) to the top-right corner of the preceding class (Q), and another from the top-left corner of the modal class (B) to the top-left corner of the succeeding class (F). The x-coordinate of their intersection point (K) is the mode.

Consider the two triangles $\triangle BKG$ and $\triangle CKH$. They are similar triangles.

Therefore, the ratio of their corresponding sides is equal:

$\frac{BG}{CH} = \frac{GK}{HK}$

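From the graph, let $x$ be the horizontal distance of K (the mode) from the lower limit $l$ of the modal class, $f_1$ the frequency of the modal class, and $h$ the class width. The vertical sides of the two triangles are then in the ratio $(f_1 - f_0) : (f_1 - f_2)$, and the corresponding horizontal distances are $x$ and $h - x$.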

Substituting these into the similarity equation:

$\frac{f_1 - f_0}{f_1 - f_2} = \frac{x}{h-x}$

Now, we solve for $x$:

$(f_1 - f_0)(h-x) = x(f_1 - f_2)$

$h(f_1 - f_0) - x(f_1 - f_0) = x(f_1 - f_2)$

$h(f_1 - f_0) = x(f_1 - f_2) + x(f_1 - f_0)$

$h(f_1 - f_0) = x(f_1 - f_2 + f_1 - f_0)$

$h(f_1 - f_0) = x(2f_1 - f_0 - f_2)$

$x = \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$

The Mode is the lower limit $l$ plus this distance $x$.

$\text{Mode} = l + x = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$


Example 1. Find the mode of the following distribution:

Class Interval Frequency ($f$)
10 - 20 8
20 - 30 15
30 - 40 20
40 - 50 12
50 - 60 5
60 - 70 3

Answer:

To Find:

The mode of the given frequency distribution.

Solution:

Step 1: Identify the modal class.

The highest frequency is 20. The class corresponding to this frequency is 30 - 40. Therefore, this is the modal class.

Step 2: Identify the values for the formula.

  • Lower limit of modal class, $l = 30$
  • Class width, $h = 40 - 30 = 10$
  • Frequency of modal class, $f_1 = 20$
  • Frequency of preceding class, $f_0 = 15$
  • Frequency of succeeding class, $f_2 = 12$

Step 3: Substitute the values into the mode formula.

$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$

$\text{Mode} = 30 + \left(\frac{20 - 15}{2(20) - 15 - 12}\right) \times 10$

$= 30 + \left(\frac{5}{40 - 27}\right) \times 10$

$= 30 + \left(\frac{5}{13}\right) \times 10 = 30 + \frac{50}{13}$

$\approx 30 + 3.85 = 33.85$

... (1)

Answer: The mode of the distribution is approximately 33.85.
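
The steps above translate directly into a small function. This is a minimal sketch assuming equal class widths; `grouped_mode` is an illustrative name, and when two classes tie for the highest frequency it simply takes the first one, which is a choice made here rather than a rule from the chapter.

```python
def grouped_mode(lower_limits, freqs, h):
    """Mode of grouped data with uniform class width h."""
    i = freqs.index(max(freqs))                       # position of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                 # preceding class, 0 if none
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0    # succeeding class, 0 if none
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * h

mode = grouped_mode([10, 20, 30, 40, 50, 60], [8, 15, 20, 12, 5, 3], h=10)
print(round(mode, 2))  # 33.85
```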


Example 2. A survey conducted on 20 households in a locality by a group of students resulted in the following frequency table for the number of family members in a household. Find the mode of this data.

Family Size Number of Families ($f$)
1 - 3 7
3 - 5 8
5 - 7 2
7 - 9 2
9 - 11 1

Answer:

To Find:

The mode of the given data.

Solution:

Step 1: Identify the modal class.

The highest frequency is 8. The class corresponding to this frequency is 3 - 5. This is our modal class.

Step 2: Identify the values for the formula.

  • Lower limit of modal class, $l = 3$
  • Class width, $h = 5 - 3 = 2$
  • Frequency of modal class, $f_1 = 8$
  • Frequency of preceding class, $f_0 = 7$
  • Frequency of succeeding class, $f_2 = 2$

Step 3: Substitute the values into the mode formula.

$\text{Mode} = 3 + \left(\frac{8 - 7}{2(8) - 7 - 2}\right) \times 2$

$= 3 + \left(\frac{1}{16 - 9}\right) \times 2$

$= 3 + \left(\frac{1}{7}\right) \times 2 = 3 + \frac{2}{7}$

Now, $\frac{2}{7} \approx 0.286$.

$\text{Mode} \approx 3 + 0.286 = 3.286$

... (1)

Answer: The mode of the family size is approximately 3.286.



Cumulative Frequency

After organising data into a frequency distribution table, we can further analyse it by calculating cumulative frequencies. Cumulative frequency is a running total of frequencies. It helps us quickly see how many data points fall below or above a certain value. This is especially useful for finding medians and for creating graphical representations like ogives.


Types of Cumulative Frequency

Cumulative frequencies are obtained by accumulating the class frequencies in one of two directions, which gives two ways to construct a cumulative frequency distribution:

1. Less Than Cumulative Frequency

This is the most common type. The cumulative frequency of a class represents the total number of observations with values less than the upper limit of that class. It answers the question, "How many data points are there up to this point?"

How to Calculate: You start with the frequency of the first class and progressively add the frequency of each subsequent class. The final cumulative frequency will be equal to the total number of observations ($N$).

2. More Than Cumulative Frequency

The cumulative frequency of a class represents the total number of observations with values greater than or equal to the lower limit of that class. It answers the question, "How many data points are there from this point onwards?"

How to Calculate: The first class takes the total number of observations ($N$). For each subsequent class, subtract the frequency of the preceding class from the preceding cumulative frequency. The final entry equals the frequency of the last class.


Example 1. For the following frequency distribution, prepare both 'less than' and 'more than' cumulative frequency tables.

Class Interval Frequency ($f$)
0 - 10 5
10 - 20 8
20 - 30 10
30 - 40 12
40 - 50 15

Answer:

First, note that the total number of observations is $N = 5+8+10+12+15 = 50$.

Part (a): 'Less Than' Cumulative Frequency Table

We create the table by adding frequencies cumulatively, using the upper limits of the class intervals.

Marks Cumulative Frequency (cf)
Less than 10 5
Less than 20 $5 + 8 = 13$
Less than 30 $13 + 10 = 23$
Less than 40 $23 + 12 = 35$
Less than 50 $35 + 15 = 50$

Part (b): 'More Than' Cumulative Frequency Table

We create the table by starting with the total frequency (50) and subtracting the frequency of each class, using the lower limits of the class intervals.

Marks Cumulative Frequency (cf)
More than or equal to 0 50
More than or equal to 10 $50 - 5 = 45$
More than or equal to 20 $45 - 8 = 37$
More than or equal to 30 $37 - 10 = 27$
More than or equal to 40 $27 - 12 = 15$

Answer: The 'less than' and 'more than' cumulative frequency distribution tables are shown above.
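
Both tables above are running totals, so they can be produced with `itertools.accumulate` from the Python standard library. The variable names below are illustrative; the data is the frequency distribution from this example.

```python
from itertools import accumulate

limits = [0, 10, 20, 30, 40, 50]   # class boundaries
freqs = [5, 8, 10, 12, 15]         # frequency of each class

# 'Less than' type: running total, paired with the upper limits
less_than = list(zip(limits[1:], accumulate(freqs)))

# 'More than' type: running total taken from the last class backwards,
# paired with the lower limits
more_than = list(zip(limits[:-1], list(accumulate(freqs[::-1]))[::-1]))

print(less_than)  # [(10, 5), (20, 13), (30, 23), (40, 35), (50, 50)]
print(more_than)  # [(0, 50), (10, 45), (20, 37), (30, 27), (40, 15)]
```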


Example 2. A medical check-up of 35 students was conducted, and their weights were recorded. The 'less than' cumulative frequency distribution is given below. Convert this distribution into a standard grouped frequency distribution.

Weight (in kg) Number of Students (cf)
Less than 38 0
Less than 40 3
Less than 42 5
Less than 44 9
Less than 46 14
Less than 48 28
Less than 50 32
Less than 52 35

Answer:

To Prepare:

A standard grouped frequency distribution table.

Given:

A 'less than' cumulative frequency distribution table.

Solution:

To find the frequency of a particular class interval, we subtract the cumulative frequency of the preceding class from the cumulative frequency of the current class. The upper limits of the 'less than' series become the upper limits of our class intervals.

Weight (in kg) Frequency ($f$)
Below 38 0
38 - 40 $3 - 0 = 3$ (cf of <40 minus cf of <38)
40 - 42 $5 - 3 = 2$ (cf of <42 minus cf of <40)
42 - 44 $9 - 5 = 4$ (cf of <44 minus cf of <42)
44 - 46 $14 - 9 = 5$ (cf of <46 minus cf of <44)
46 - 48 $28 - 14 = 14$ (cf of <48 minus cf of <46)
48 - 50 $32 - 28 = 4$ (cf of <50 minus cf of <48)
50 - 52 $35 - 32 = 3$ (cf of <52 minus cf of <50)
Total $\sum f = 35$

The sum of the calculated frequencies is $0+3+2+4+5+14+4+3=35$, which matches the total number of students. This confirms our calculation is correct.

Answer: The standard grouped frequency distribution is shown in the table above.
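
The conversion above amounts to taking differences of consecutive cumulative frequencies. A minimal sketch using the table from this example (variable names are illustrative):

```python
# 'Less than' table from Example 2: (upper limit, cumulative frequency)
less_than = [(38, 0), (40, 3), (42, 5), (44, 9), (46, 14),
             (48, 28), (50, 32), (52, 35)]

# Each class runs from one listed limit to the next; its frequency is the
# difference between consecutive cumulative frequencies.
table = [((lo, hi), cf_hi - cf_lo)
         for (lo, cf_lo), (hi, cf_hi) in zip(less_than, less_than[1:])]

for (lo, hi), f in table:
    print(f"{lo} - {hi}: {f}")

# Adding back the 'Below 38' count recovers the total number of students.
print(sum(f for _, f in table) + less_than[0][1])  # 35
```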



Median of Grouped Data

The median is the middlemost value in a dataset when arranged in order. It divides the data into two equal halves. For grouped data, we cannot pinpoint the exact middle value, but we can determine the median class—the class interval where the middle value lies. We then use a formula to estimate the median's precise value within that class.


Finding the Median Class

To find the median, we first need a 'less than' cumulative frequency (cf) distribution.

  1. Calculate the total number of observations, $N = \sum f_i$.
  2. Find the position of the median item, which is $\frac{N}{2}$.
  3. Locate the class interval whose cumulative frequency is the first to be greater than or equal to $\frac{N}{2}$. This interval is the median class.

Formula for Median of Grouped Data

Once the median class is identified, the median is calculated using the following interpolation formula:

$\text{Median} = l + \left(\frac{\frac{N}{2} - cf}{f}\right) \times h$

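where:

  • $l$ = lower limit of the median class
  • $N$ = total number of observations ($\sum f_i$)
  • $cf$ = cumulative frequency of the class preceding the median class
  • $f$ = frequency of the median class
  • $h$ = class width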

Derivation of the Median Formula

The formula is derived by assuming that the frequencies are uniformly distributed within the median class. We use similar triangles from the 'less than' ogive (cumulative frequency polygon).

[Figure: A 'less than' ogive curve showing the median class and the similar triangles used for deriving the median formula.]

Let the median class have lower limit $l$. The cumulative frequency up to this class is $cf$, and the cumulative frequency at the end of this class is $cf+f$. The median position is at $\frac{N}{2}$ on the y-axis.

In the diagram, $\triangle A E B$ is similar to $\triangle A F C$. Therefore, the ratio of their corresponding sides is equal:

$\frac{EB}{FC} = \frac{AE}{AF}$

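From the graph, the horizontal sides of the two triangles have lengths $\text{Median} - l$ and $h$, and the vertical sides have lengths $\frac{N}{2} - cf$ and $f$, where $f$ is the frequency of the median class and $h$ is its width.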

Substituting these into the similarity equation:

$\frac{\text{Median} - l}{h} = \frac{\frac{N}{2} - cf}{f}$

Now, rearrange the formula to solve for the Median:

$\text{Median} - l = \left(\frac{\frac{N}{2} - cf}{f}\right) \times h$

$\text{Median} = l + \left(\frac{\frac{N}{2} - cf}{f}\right) \times h$


Example 1. Find the median of the following data:

Class Interval Frequency ($f$)
0-10 5
10-20 3
20-30 10
30-40 12
40-50 15
50-60 8
60-70 2

Answer:

Solution:

Step 1: Create the cumulative frequency table.

Class Interval Frequency ($f$) Cumulative Frequency (cf)
0-10 5 5
10-20 3 8
20-30 10 18
30-40 12 30
40-50 15 45
50-60 8 53
60-70 2 55
Total $N = 55$

Step 2: Identify the median class.

Total observations $N = 55$. The median position is $\frac{N}{2} = \frac{55}{2} = 27.5$.

The cumulative frequency just greater than 27.5 is 30. The corresponding class is 30 - 40, which is the median class.

Step 3: Identify the values for the formula.

  • $l = 30$
  • $N = 55$
  • $cf = 18$ (cf of the class before the median class)
  • $f = 12$
  • $h = 10$

Step 4: Substitute into the median formula.

$\text{Median} = 30 + \left(\frac{27.5 - 18}{12}\right) \times 10$

$= 30 + \left(\frac{9.5}{12}\right) \times 10 = 30 + \frac{95}{12}$

$\approx 30 + 7.92 = 37.92$

... (1)

Answer: The median of the distribution is approximately 37.92.
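
The four steps above can be sketched as a single function. This assumes equal class widths and uses the 'first cumulative frequency greater than or equal to $\frac{N}{2}$' rule for the median class; `grouped_median` is an illustrative name.

```python
from itertools import accumulate

def grouped_median(lower_limits, freqs, h):
    """Median of grouped data with uniform class width h."""
    n = sum(freqs)
    cum = list(accumulate(freqs))                          # 'less than' cf column
    i = next(k for k, c in enumerate(cum) if c >= n / 2)   # median class index
    cf = cum[i - 1] if i > 0 else 0                        # cf before the median class
    return lower_limits[i] + (n / 2 - cf) / freqs[i] * h

median = grouped_median([0, 10, 20, 30, 40, 50, 60],
                        [5, 3, 10, 12, 15, 8, 2], h=10)
print(round(median, 2))  # 37.92
```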


Example 2. If the median of the distribution given below is 28.5, find the values of x and y.

Class Interval Frequency ($f$)
0-10 5
10-20 x
20-30 20
30-40 15
40-50 y
50-60 5
Total 60

Answer:

To Find:

The values of the missing frequencies, x and y.

Solution:

Step 1: Set up the cumulative frequency table and form an equation.

Class Interval Frequency ($f$) Cumulative Frequency (cf)
0-10 5 5
10-20 x $5+x$
20-30 20 $25+x$
30-40 15 $40+x$
40-50 y $40+x+y$
50-60 5 $45+x+y$

We are given that the total frequency is 60. From our table, the final cumulative frequency is $45+x+y$. Therefore:

$45 + x + y = 60$

$x + y = 15$

... (1)

Step 2: Use the given median to find the median class.

The given median is 28.5. This value lies in the class interval 20 - 30. Therefore, this is our median class.

Step 3: Identify the values for the median formula.

  • $l = 20$
  • $N = 60 \implies \frac{N}{2} = 30$
  • $cf = 5+x$ (cf of the class before the median class)
  • $f = 20$
  • $h = 10$

Step 4: Substitute into the median formula and solve for x.

$28.5 = 20 + \left(\frac{30 - (5+x)}{20}\right) \times 10$

$28.5 - 20 = \left(\frac{30 - 5 - x}{20}\right) \times 10$

$8.5 = \left(\frac{25 - x}{20}\right) \times 10$

Simplify $\frac{10}{20}$ to $\frac{1}{2}$:

$8.5 = \frac{25 - x}{2}$

$8.5 \times 2 = 25 - x$

$17 = 25 - x \implies x = 25 - 17 = 8$

... (2)

Step 5: Find y using equation (1).

Substitute $x=8$ into $x+y=15$:

$8 + y = 15 \implies y = 15 - 8 = 7$

... (3)

Answer: The values of the missing frequencies are x = 8 and y = 7.
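
The algebra above can be re-run with exact fractions as a check. This is a sketch under the same assumption the solution makes, namely that the median class is 20 - 30; the variable names are illustrative.

```python
from fractions import Fraction

total = 60
median = Fraction(57, 2)        # 28.5
l, h, f = 20, 10, 20            # median class 20 - 30: lower limit, width, frequency
cf_known = 5                    # cf before the median class is 5 + x
other_known = 5 + 20 + 15 + 5   # sum of all known frequencies

# median = l + ((N/2 - (cf_known + x)) / f) * h, solved for x
x = Fraction(total, 2) - cf_known - (median - l) * f / h
y = total - other_known - x
print(x, y)  # 8 7
```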



Relationship Between Mean, Median and Mode

The mean, median, and mode are the three most common measures of central tendency. While they all aim to describe a 'typical' value in a dataset, they do so in different ways and their values can differ, especially in datasets that are not symmetrical. The relationship between these three measures provides insights into the shape and distribution of the data.


Relationship Based on Distribution Shape

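The shape of a data distribution, often visualized with a frequency curve or histogram, can be described by its skewness, the extent to which the distribution is asymmetrical. The ordering of the three measures typically indicates the direction of the skew:

  • Symmetrical distribution: Mean = Median = Mode.
  • Positively skewed distribution (longer tail to the right): Mode < Median < Mean.
  • Negatively skewed distribution (longer tail to the left): Mean < Median < Mode.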

The Empirical Relationship

For distributions that are moderately skewed (not perfectly symmetrical but also not extremely skewed), Karl Pearson observed an approximate relationship between the three measures. This is known as the empirical formula.

Primary Formula:

$\text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean}$

This formula is an approximation and works best for unimodal, moderately skewed distributions. It is very useful for estimating one measure of central tendency if the other two are known.

Variations of the Formula:

The primary formula can be rearranged to find the Mean or Median:

  1. To find the Median:

    $3 \times \text{Median} = \text{Mode} + 2 \times \text{Mean}$

    $\text{Median} = \frac{\text{Mode} + 2 \times \text{Mean}}{3}$

  2. To find the Mean:

    $2 \times \text{Mean} = 3 \times \text{Median} - \text{Mode}$

    $\text{Mean} = \frac{3 \times \text{Median} - \text{Mode}}{2}$


Example 1. In a moderately skewed distribution, the mean is 25 and the median is 26. Find the approximate mode.

Answer:

To Find:

The approximate mode of the distribution.

Given:

Mean = 25, Median = 26.

Solution:

We use the empirical formula to estimate the mode.

$\text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean}$

Substitute the given values:

$\text{Mode} = 3(26) - 2(25)$

$= 78 - 50 = 28$

... (1)

Answer: The approximate mode of the distribution is 28.

(Note: Since Mean (25) < Median (26) < Mode (28), this distribution is negatively skewed).


Example 2. For a certain distribution, the mean is 30 and the mode is 24. Estimate the median.

Answer:

To Find:

The approximate median of the distribution.

Given:

Mean = 30, Mode = 24.

Solution:

We use the rearranged empirical formula to find the median.

$\text{Median} = \frac{\text{Mode} + 2 \times \text{Mean}}{3}$

Substitute the given values:

$\text{Median} = \frac{24 + 2(30)}{3}$

$= \frac{24 + 60}{3} = \frac{84}{3}$

$= 28$

... (1)

Answer: The approximate median of the distribution is 28.

(Note: Since Mode (24) < Median (28) < Mean (30), this distribution is positively skewed).


Example 3. If the mode of a distribution is 8 and its median is 9, what will be its mean?

Answer:

To Find:

The approximate mean of the distribution.

Given:

Mode = 8, Median = 9.

Solution:

We use the rearranged empirical formula to find the mean.

$\text{Mean} = \frac{3 \times \text{Median} - \text{Mode}}{2}$

Substitute the given values:

$\text{Mean} = \frac{3(9) - 8}{2}$

$= \frac{27 - 8}{2} = \frac{19}{2}$

$= 9.5$

... (1)

Answer: The approximate mean of the distribution is 9.5.
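
The empirical formula and its two rearrangements can be collected into three small helpers. The function names are illustrative, and the results are estimates in the same sense as the formula itself.

```python
def mode_from(mean, median):
    return 3 * median - 2 * mean

def median_from(mean, mode):
    return (mode + 2 * mean) / 3

def mean_from(median, mode):
    return (3 * median - mode) / 2

print(mode_from(mean=25, median=26))   # 28
print(median_from(mean=30, mode=24))   # 28.0
print(mean_from(median=9, mode=8))     # 9.5
```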



Graphical Representation of Cumulative Frequency Distribution (Ogives)

A cumulative frequency distribution can be represented graphically by a smooth curve known as an ogive (pronounced 'o-jive'). Ogives are powerful visual tools that help in understanding the distribution of data and are particularly useful for determining the median graphically.

There are two types of ogives, corresponding to the two types of cumulative frequency distributions: the 'less than' ogive and the 'more than' ogive.


Drawing a 'Less than' Ogive

A 'less than' ogive is a graph of the 'less than' cumulative frequency against the upper class limits. It shows the number of observations that are less than a particular value.

Steps to Draw a 'Less than' Ogive:

  1. Prepare a 'less than' cumulative frequency table.
  2. On a graph, mark the upper class limits on the horizontal axis (x-axis) and the corresponding cumulative frequencies on the vertical axis (y-axis).
  3. To anchor the curve to the x-axis, plot a point corresponding to the lower limit of the first class with a cumulative frequency of 0.
  4. Join the plotted points with a smooth, freehand curve.

Drawing a 'More than' Ogive

A 'more than' ogive is a graph of the 'more than' cumulative frequency against the lower class limits. It shows the number of observations that are greater than or equal to a particular value.

Steps to Draw a 'More than' Ogive:

  1. Prepare a 'more than' cumulative frequency table.
  2. On a graph, mark the lower class limits on the horizontal axis (x-axis) and the corresponding cumulative frequencies on the vertical axis (y-axis).
  3. To anchor the curve to the x-axis, plot a point corresponding to the upper limit of the last class with a cumulative frequency of 0.
  4. Join the plotted points with a smooth, freehand curve.

Finding the Median from Ogives

Ogives provide an excellent graphical method for estimating the median of grouped data.

  1. Using a Single ('Less than') Ogive:
    • Calculate $N/2$, where $N$ is the total frequency.
    • Locate the value $N/2$ on the vertical (y-axis).
    • From this point, draw a horizontal line to intersect the 'less than' ogive curve.
    • From the point of intersection, draw a perpendicular line down to the horizontal (x-axis).
    • The point where this perpendicular line meets the x-axis is the median of the data.
  2. Using Both Ogives:
    • Draw both the 'less than' and 'more than' ogives on the same graph paper.
    • The two curves will intersect at a point.
    • From this point of intersection, draw a perpendicular line down to the x-axis.
    • The x-coordinate of the intersection point is the median of the data.

Example 1. The following distribution gives the daily income of 50 workers of a factory. Draw both 'less than' and 'more than' ogives for this data on the same graph and find the median from the graph.

Daily Income (in ₹) | Number of Workers ($f$)
100 - 120 | 12
120 - 140 | 14
140 - 160 | 8
160 - 180 | 6
180 - 200 | 10

Answer:

Solution:

Step 1: Prepare the cumulative frequency tables.

The total number of workers is $N=50$.

Income (Less than) | CF
Less than 120 | 12
Less than 140 | $12+14=26$
Less than 160 | $26+8=34$
Less than 180 | $34+6=40$
Less than 200 | $40+10=50$

Income (More than or equal to) | CF
More than or equal to 100 | 50
More than or equal to 120 | $50-12=38$
More than or equal to 140 | $38-14=24$
More than or equal to 160 | $24-8=16$
More than or equal to 180 | $16-6=10$

Step 2: Plot the points for both ogives.

  • 'Less than' points: (100, 0), (120, 12), (140, 26), (160, 34), (180, 40), (200, 50).
  • 'More than' points: (100, 50), (120, 38), (140, 24), (160, 16), (180, 10), (200, 0).

Step 3: Draw the ogives and find the median.

We plot these points on a graph. The x-axis represents 'Daily Income (in ₹)' and the y-axis represents 'Cumulative Frequency (Number of Workers)'.

Graph showing both 'less than' and 'more than' ogives intersecting. A perpendicular from the intersection point to the x-axis indicates the median.

The two ogives intersect at a point. From this intersection point, we draw a perpendicular to the x-axis. The point where it meets the x-axis is the median income.

From the graph, the perpendicular from the intersection point meets the x-axis at approximately 138.6.

We can also find the median from the 'less than' ogive alone. We locate $N/2 = 50/2 = 25$ on the y-axis, draw a horizontal line to the 'less than' curve, and then a vertical line to the x-axis, which also gives approximately 138.6.
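The graph reading can be cross-checked by computing where the two ogives cross. This is a sketch under the assumption that each ogive is made of straight-line segments between the plotted points; the function name `ogive_intersection` is illustrative.

```python
def ogive_intersection(less_than, more_than):
    """Find where straight-line 'less than' and 'more than' ogives cross.

    Both lists hold (x, cumulative frequency) pairs at the same x values.
    """
    pairs = list(zip(less_than, more_than))
    for ((x0, a0), (_, b0)), ((x1, a1), (_, b1)) in zip(pairs, pairs[1:]):
        d0, d1 = a0 - b0, a1 - b1  # gap between the curves at each end
        if d0 <= 0 <= d1 and d1 != d0:
            t = -d0 / (d1 - d0)    # fraction of the segment where the gap is 0
            return x0 + t * (x1 - x0), a0 + t * (a1 - a0)
    raise ValueError("curves do not cross")

less_than = [(100, 0), (120, 12), (140, 26), (160, 34), (180, 40), (200, 50)]
more_than = [(100, 50), (120, 38), (140, 24), (160, 16), (180, 10), (200, 0)]
x, y = ogive_intersection(less_than, more_than)
print(round(x, 2), round(y, 2))
```

Under this assumption the curves cross between 120 and 140 at an income of about 138.57 and a height of about 25, which is $N/2$.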

Answer: The median daily income is approximately $\textsf{₹ } 138.60$.


Example 2. During the medical check-up of 35 students of a class, their weights were recorded as follows. Draw a 'less than' ogive for the given data. Hence, obtain the median weight from the graph.

Weight (in kg) | Number of Students (cf)
Less than 38 | 0
Less than 40 | 3
Less than 42 | 5
Less than 44 | 9
Less than 46 | 14
Less than 48 | 28
Less than 50 | 32
Less than 52 | 35

Answer:

Solution:

Step 1: Identify the points to plot.

The data is already in the 'less than' cumulative frequency format. The points to be plotted are (Upper Limit, CF).

The points are: (38, 0), (40, 3), (42, 5), (44, 9), (46, 14), (48, 28), (50, 32), (52, 35).

Step 2: Draw the 'less than' ogive.

Plot the points on a graph with 'Weight (in kg)' on the x-axis and 'Cumulative Frequency' on the y-axis.

A 'less than' ogive for student weights. A line from N/2 on the y-axis intersects the curve, and a perpendicular from there gives the median on the x-axis.

Step 3: Find the median from the graph.

The total number of students is $N=35$. The median position is $N/2 = 35/2 = 17.5$.

Locate 17.5 on the y-axis. Draw a horizontal line from this point to the ogive curve. From the intersection point on the curve, draw a vertical line down to the x-axis. The value on the x-axis is the median.

From the graph, the median weight is approximately 46.5 kg.
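As a cross-check, the class frequencies can be recovered from the cumulative table and the grouped-data median formula $l + \left(\frac{\frac{N}{2} - cf}{f}\right) \times h$ applied. The sketch below assumes the classes are 38 - 40, 40 - 42, and so on up to 50 - 52, as implied by the upper limits in the table; the variable names are illustrative.

```python
# (upper limit, cumulative frequency) pairs from the 'less than' table.
cumulative = [(38, 0), (40, 3), (42, 5), (44, 9), (46, 14),
              (48, 28), (50, 32), (52, 35)]

n = cumulative[-1][1]  # total frequency N
half = n / 2

# Each class is (lower, upper, frequency, cf of the preceding classes);
# the frequency is the difference between successive cumulative values.
classes = [
    (lo, hi, c_hi - c_lo, c_lo)
    for (lo, c_lo), (hi, c_hi) in zip(cumulative, cumulative[1:])
]

# Median class: the first class whose cumulative frequency reaches N/2.
l, upper, f, cf = next(c for c in classes if c[3] + c[2] >= half)
h = upper - l
median = l + (half - cf) / f * h
print(l, upper, f, cf, median)
```

The median class comes out as 46 - 48 with $f = 14$ and $cf = 14$, and the formula gives 46.5 kg, in agreement with the graph.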

Answer: The median weight obtained from the ogive is approximately 46.5 kg.



Estimation of Median (Graphically)

Cumulative frequency curves, or ogives, provide a powerful visual method for estimating the median of a grouped data distribution. This graphical technique complements the formula-based calculation and helps in visualizing where the middle value of the data lies.


Method 1: Using a Single 'Less than' Ogive

This is the most common graphical method. It relies on the definition of the median as the value of the $\frac{N}{2}$-th observation.

Steps:

  1. Draw a 'less than' ogive for the given data.
  2. Calculate the median position by finding $\frac{N}{2}$, where $N$ is the total frequency.
  3. Locate this value, $\frac{N}{2}$, on the vertical (Cumulative Frequency) axis.
  4. From this point on the y-axis, draw a horizontal line parallel to the x-axis until it intersects the ogive curve.
  5. From the point of intersection on the curve, draw a vertical line downwards until it meets the horizontal (x-axis).
  6. The point where the vertical line meets the x-axis is the estimated median of the data.

A 'less than' ogive. A line from N/2 on the y-axis intersects the curve, and a perpendicular from there gives the median on the x-axis.

Method 2: Using Both 'Less than' and 'More than' Ogives

This method provides a clear visual confirmation of the median by finding the point where the two types of ogives cross. The median is the value that divides the data into two halves, with as many observations below it as above it.

Steps:

  1. Draw both the 'less than' ogive and the 'more than' ogive on the same graph with the same scale.
  2. Identify the point where the two curves intersect.
  3. From this intersection point, draw a perpendicular line down to the x-axis.
  4. The x-coordinate of this point on the x-axis is the estimated median.

Why this works: The 'less than' ogive shows the count of observations below a value, and the 'more than' ogive shows the count of observations at or above it. At the point where the curves intersect, these two counts are equal, so half of the observations lie on each side, which is the defining property of the median. The y-coordinate of this intersection point will always be $\frac{N}{2}$.
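The value of the y-coordinate can be justified in a few lines. As a sketch, write $L(x)$ for the 'less than' cumulative frequency at $x$ and $M(x)$ for the 'more than or equal to' cumulative frequency at $x$ (these two symbols are introduced here only for illustration). Every observation is counted by exactly one of the two, so

$L(x) + M(x) = N$

At the intersection point the two curves have the same height, $L(x) = M(x)$, hence

$2L(x) = N \implies L(x) = M(x) = \frac{N}{2}$

So the intersection always lies on the horizontal line $y = \frac{N}{2}$, the same line used with a single 'less than' ogive, and both methods lead to the same point on the x-axis.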

Graph showing both 'less than' and 'more than' ogives intersecting. A perpendicular from the intersection point to the x-axis indicates the median.

Example 1. The following distribution gives the daily income of 50 workers of a factory. Draw 'less than' and 'more than' ogives on the same graph and hence find the median income.

Daily Income (in ₹) | Number of Workers ($f$)
100 - 120 | 12
120 - 140 | 14
140 - 160 | 8
160 - 180 | 6
180 - 200 | 10

Answer:

To Find:

The median income by drawing both 'less than' and 'more than' ogives.

Solution:

Step 1: Prepare the cumulative frequency tables.

The total number of workers is $N=50$. The median position is $N/2 = 25$.

Income (Less than) | CF
Less than 120 | 12
Less than 140 | 26
Less than 160 | 34
Less than 180 | 40
Less than 200 | 50

Income (More than or equal to) | CF
More than or equal to 100 | 50
More than or equal to 120 | 38
More than or equal to 140 | 24
More than or equal to 160 | 16
More than or equal to 180 | 10

Step 2: Identify the points and draw the ogives.

  • 'Less than' points: (100, 0), (120, 12), (140, 26), (160, 34), (180, 40), (200, 50).
  • 'More than' points: (100, 50), (120, 38), (140, 24), (160, 16), (180, 10), (200, 0).

We plot these points on a graph and join them with smooth curves.

Graph showing both 'less than' and 'more than' ogives intersecting at a point. A perpendicular from the intersection to the x-axis gives the median income.

Step 3: Determine the median from the graph.

The two ogives intersect at a point. We draw a vertical line from this intersection point down to the x-axis. The value at this point on the x-axis is the median.

From the graph, the x-coordinate of the intersection point is approximately 138.6.

Verification by Formula

The median position is $N/2 = 25$. The cumulative frequency just greater than 25 is 26, which corresponds to the median class 120 - 140.

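  • $l=120$ (lower limit of the median class), $N=50$, $cf=12$ (cumulative frequency of the class preceding the median class), $f=14$ (frequency of the median class), $h=20$ (class size).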

$\text{Median} = 120 + \left(\frac{25 - 12}{14}\right) \times 20$

$= 120 + \frac{13}{14} \times 20 = 120 + \frac{260}{14} \approx 120 + 18.57 = 138.57$

The value from the graph (138.6) is very close to the calculated value (138.57), confirming our result.
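The close agreement is expected. As a sketch, assume the 'less than' ogive is a straight line across the median class, running from the point $(l, cf)$ to the point $(l+h, cf+f)$. Its height $y$ at income $x$ is then

$y = cf + \frac{f}{h}(x - l)$

Setting $y = \frac{N}{2}$ and solving for $x$ gives

$x = l + \left(\frac{\frac{N}{2} - cf}{f}\right) \times h$

which is the median formula applied above. With $l=120$, $cf=12$, $f=14$ and $h=20$, this straight line joins $(120, 12)$ and $(140, 26)$, the two plotted 'less than' points on either side of $N/2 = 25$. Any small difference between the graphical reading and the formula comes from the freehand curve and the precision with which the graph can be read.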

Answer: The median daily income is approximately $\textsf{₹ } 138.60$.